0.1 Introduction

Epidemiological reports often need tables that convey information as efficiently as possible. In this lesson we will build such tables in R with gt, the grammar of tables package.

0.2 Learning objectives

  • Use gt() to create a simple table

    • Title and subtitle

    • Format percentages and round decimals

  • Conditional coloring

    • Numeric/continuous data

      • Scale to range of values in your data (for continuous/sequential range)

      • Discrete color scale (set numeric ranges for each color)

    • Discrete/categorical data

      • Color by text string/factor level (e.g., met, partially met, not met)

  • Format text by value (font color, bold, etc.)

  • Stratified tables (By age group or sex or both)

    • Preparing data for stratified table

    • Stub and stub head, spanner columns

  • External resources for further customization

    • Color palettes

    • Borders

    • kableExtra

0.3 Packages covered in this lesson

In this lesson, we will use the following packages:

  • gt

  • dplyr, tidyr, and purrr

  • janitor

  • kableExtra

  • paletteer, ggsci

0.4 Introduction to the dataset

We will use data from the Malawi HIV Program covering the four quarters of 2019. You can access the data yourself here.

First, let’s import the four datasets at once.

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
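
The import code itself is not shown above. As a rough sketch only, assuming the four quarterly files are CSVs sitting in one folder (the folder, the file names, the toy columns and the object name data_united_demo are all made up for illustration), reading and stacking them in one go with purrr and readr could look like this:

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical setup: write four tiny CSV files to a temporary folder so the
# sketch runs on its own. In practice you would point to your real files.
data_dir <- file.path(tempdir(), "hiv_quarters_demo")
dir.create(data_dir, showWarnings = FALSE)

walk(1:4, \(q) {
  write_csv(
    tibble(period = paste("2019", paste0("Q", q)), new_positive = q * 10),
    file.path(data_dir, paste0("quarter_", q, ".csv"))
  )
})

# List the files, read each one, and stack them into a single data frame
quarter_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

data_united_demo <- quarter_files |>
  map(\(path) read_csv(path, show_col_types = FALSE)) |>
  bind_rows()
```

Reading the files through map() keeps the code the same whether there are four files or forty; only the folder contents change.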

Second, let’s see what this data looks like:

data_united |> 
  glimpse()
## Rows: 17,235
## Columns: 29
## $ region                                 <chr> "Northern Region", "Northern Re…
## $ zone                                   <chr> "Northern Zone", "Northern Zone…
## $ district                               <chr> "Chitipa", "Chitipa", "Chitipa"…
## $ traditional_authority                  <chr> "Senior TA Bulambya Songwe", "S…
## $ facility_name                          <chr> "Kapenda Health Centre", "Kapen…
## $ datim_code                             <chr> "K9u9BIAaJJT", "K9u9BIAaJJT", "…
## $ system                                 <chr> "e-mastercard", "e-mastercard",…
## $ hsector                                <chr> "Public", "Public", "Public", "…
## $ period                                 <chr> "2019 Q1", "2019 Q1", "2019 Q1"…
## $ reporting_period                       <chr> "1st month of quarter", "1st mo…
## $ sub_groups                             <chr> "All patients (checked data)", …
## $ new_women_registered                   <dbl> 45, NA, 40, NA, 43, NA, 32, NA,…
## $ total_women_in_booking_cohort          <dbl> NA, 55, NA, 44, NA, 37, NA, 43,…
## $ not_tested_for_syphilis                <dbl> NA, 45, NA, 19, NA, 5, NA, 43, …
## $ syphilis_negative                      <dbl> NA, 10, NA, 25, NA, 32, NA, 0, …
## $ syphilis_positive                      <dbl> NA, 0, NA, 0, NA, 0, NA, 0, NA,…
## $ hiv_status_not_ascertained             <dbl> 4, 7, 9, 4, 9, 5, 3, 3, 4, 26, …
## $ previous_negative                      <dbl> 0, 0, 0, 0, 0, 1, 3, 5, 1, 3, 0…
## $ previous_positive                      <dbl> 0, 0, 0, 1, 1, 0, 1, 1, 1, 3, 2…
## $ new_negative                           <dbl> 40, 47, 30, 38, 33, 31, 25, 34,…
## $ new_positive                           <dbl> 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0…
## $ not_on_cpt                             <dbl> NA, 0, NA, 0, NA, 0, NA, 0, NA,…
## $ on_cpt                                 <dbl> NA, 1, NA, 2, NA, 0, NA, 1, NA,…
## $ no_ar_vs                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ already_on_art_when_starting_anc       <dbl> 0, 1, 0, 1, 1, 0, 1, 1, 1, 3, 2…
## $ started_art_at_0_27_weeks_of_pregnancy <dbl> 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
## $ started_art_at_28_weeks_of_preg        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ no_ar_vs_dispensed_for_infant          <dbl> NA, 0, NA, 0, NA, 0, NA, 0, NA,…
## $ ar_vs_dispensed_for_infant             <dbl> NA, 1, NA, 2, NA, 0, NA, 1, NA,…

For the sake of convenience, we will summarize the data by zone and quarter, and keep only the five columns that describe HIV test outcomes:

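summarized_data <- data_united |> 
  # one row per zone and quarter; the result stays grouped by zone
  group_by(
    zone,
    period
  ) |> 
  summarise(
    across(
      .cols = c(
        previous_negative, 
        previous_positive, 
        new_negative, 
        new_positive,
        hiv_status_not_ascertained
      ),
      # sum each outcome column, ignoring missing values
      .fns = \(x) sum(x, na.rm = TRUE)
    )
  )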
## `summarise()` has grouped output by 'zone'. You can override using the
## `.groups` argument.

0.5 Creating simple tables

0.5.1 The grammar and components of a table

`gt` is an R package that produces publication-ready tables. It is built on an idiom called the grammar of tables, which lets you describe the components of any table in a consistent manner; think of it as the ggplot2 of tables. To start using the package, we first need to understand the basics of this grammar.

As the figure on the package website shows, `gt` treats a table as a combination of the following components:

  • the Table Header (optional; with a title and possibly a subtitle)

  • the Stub and the Stub Head (optional; contains row labels, optionally within row groups having row group labels and possibly summary labels when a summary is present)

  • the Column Labels (contains column labels, optionally under spanner column labels)

  • the Table Body (contains columns and rows of cells)

  • the Table Footer (optional; possibly with footnotes and source notes)

Now we will combine this knowledge with the syntax of the package to make tables you can be proud of.
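
To make the components listed above concrete, here is a minimal sketch that touches each of them once. The toy data frame and all labels are invented for illustration; only the gt functions themselves come from the package.

```r
library(gt)

# Toy data, invented for illustration only
toy <- data.frame(
  zone = c("North", "North", "South", "South"),
  period = c("Q1", "Q2", "Q1", "Q2"),
  new_negative = c(120, 135, 210, 190),
  new_positive = c(8, 6, 15, 11)
)

toy_table <- toy |>
  # Stub: `period` becomes the row labels, `zone` the row groups
  gt(rowname_col = "period", groupname_col = "zone") |>
  # Table Header
  tab_header(title = "Toy test results", subtitle = "Two quarters, two zones") |>
  # Stub Head: the label that sits above the stub
  tab_stubhead(label = "Quarter") |>
  # Column Labels, placed under a spanner column label
  cols_label(new_negative = "Negative", new_positive = "Positive") |>
  tab_spanner(label = "New tests", columns = c(new_negative, new_positive)) |>
  # Table Footer: a source note
  tab_source_note(source_note = "Invented numbers, for illustration only")

toy_table
```

Each function call adds or fills one component, which is why gt code tends to read as a list of table parts piped one after another.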

0.5.2 A simple table

To create a simple table from our summarized data, we can simply call the gt() function. Because summarized_data is still grouped by zone, gt() uses the zones as row group labels:

summarized_data |> 
  gt()
period   previous_negative  previous_positive  new_negative  new_positive  hiv_status_not_ascertained
Central East Zone
2019 Q1                994               1156         47471           616                        2824
2019 Q2                517               1036         42923           443                        3209
2019 Q3               1097               1158         46700           534                        3583
2019 Q4                595               1039         45807           399                        1961
Central West Zone
2019 Q1               1568               2526         75547          1388                        2131
2019 Q2               1322               2567         73520          1470                        1804
2019 Q3               1548               2844         81099          1382                        1657
2019 Q4                457               2715         78921          1292                        1216
Northern Zone
2019 Q1                675               1197         36196           664                        1126
2019 Q2                590               1084         35315           582                        1301
2019 Q3                542               1191         36850           570                         954
2019 Q4                346               1132         34322           519                         747
South East Zone
2019 Q1               1583               5766         74926          1976                        1454
2019 Q2               1672               5688         76937          1890                        1566
2019 Q3               1910               5966         80067          1803                        1243
2019 Q4               1504               5953         79454          1861                        1385
South West Zone
2019 Q1               1775               4171         50554          1555                         717
2019 Q2               1504               4726         53554          1747                        1566
2019 Q3               1394               4640         55813          1618                         928
2019 Q4               3391               4861         53118          1575                         975

As you can see, the table is still quite raw. It is a bit more presentable than the output in R’s console, and it took nothing more than a two-letter function call, far less work than producing the same table in Excel.

0.5.2.1 Adding details to the table

We need to add more details to the table, such as a title and a subtitle. We can do that with the tab_header() function by specifying its title and subtitle arguments. We can also add the source of the data as a source note at the bottom of the table with tab_source_note():

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program")
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous_negative previous_positive new_negative new_positive hiv_status_not_ascertained
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program
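
The lesson objectives also mention formatting percentages and rounding decimals. As a minimal sketch, using two rows copied from the Central East Zone figures and a positivity column that is computed here purely for illustration (it is not part of the lesson’s data), gt’s fmt_number() and fmt_percent() can be used like this:

```r
library(gt)

# Two rows copied from the Central East Zone figures, for illustration
excerpt <- data.frame(
  period = c("2019 Q1", "2019 Q2"),
  new_negative = c(47471, 42923),
  new_positive = c(616, 443)
)

# Share of new tests that were positive (a proportion between 0 and 1).
# This column is an assumption added for the example.
excerpt$positivity <- excerpt$new_positive /
  (excerpt$new_positive + excerpt$new_negative)

formatted_table <- excerpt |>
  gt() |>
  # Whole numbers with a thousands separator
  fmt_number(columns = c(new_negative, new_positive), decimals = 0) |>
  # Proportions shown as percentages rounded to one decimal place
  fmt_percent(columns = positivity, decimals = 1)

formatted_table
```

Note that fmt_percent() expects proportions, so the values are multiplied by 100 for display only; the underlying data is left unchanged.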

0.5.2.2 Formatting the values in the table

Great, now we know how to make a simple gt table and add details to it. However, since we have a relatively large table with different kinds of information, color scaling can add a helpful visual cue. Say, for example, that we want the cells with the highest values in the new_positive column to stand out from the ones with the lowest values. This can be done in a few lines of code:

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  # color the new_positive cells on a red scale, according to their values
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  )
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous_negative previous_positive new_negative new_positive hiv_status_not_ascertained
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program

The new function, gt::data_color(), may seem a bit intimidating at first, but the logic is straightforward:

  • We specify the column whose colors we want to format in the columns argument. We can specify more than one column with c(…) if the formatting is the same for each of them.

  • We specify a palette of colors as character strings in hexadecimal color format (e.g., “#FFEBED99”). Fortunately, we don’t have to write these by hand, although we could; we use the paletteer package to generate them. scales::col_numeric() then turns the palette into a function that maps numbers to colors, and domain = NULL scales it to the range of values found in the column.

  • The paletteer package gives access to palettes from many other color packages; in our case we used one from ggsci. We set the number of color shades to use (n = 10) and wrapped the result in as.character() so that the palette passed to data_color() is a plain character vector.
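
To see what the function passed to fn actually does, it can help to call a color function on its own. The sketch below uses scales directly; the two-color palette and the cut points are arbitrary choices for illustration. It also shows scales::col_bin(), which is one way to get a discrete scale with a fixed numeric range for each color.

```r
library(scales)

# Continuous scale: with domain = NULL the palette is stretched over the
# range of whatever values are passed in
continuous_pal <- col_numeric(palette = c("white", "red"), domain = NULL)
continuous_cols <- continuous_pal(c(399, 1000, 1976))

# Discrete scale: fixed numeric ranges, one color per bin
# (the cut points 0, 1000, 2000 and 3000 are arbitrary)
binned_pal <- col_bin(
  palette = c("green", "orange", "red"),
  domain = c(0, 3000),
  bins = c(0, 1000, 2000, 3000)
)
binned_cols <- binned_pal(c(399, 1500, 2500))

# Both return one hexadecimal color string per input value
continuous_cols
binned_cols
```

Either function can be handed to data_color() through its fn argument, for example fn = binned_pal, in place of the col_numeric() call used in the lesson.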

We can do the same for the new_negative column with a different palette. Here I’m using the green palette from the same package, ggsci::green_material. You can find all the palettes included in the paletteer package here.

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  # red scale for the new positive results
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  ) |> 
  # green scale for the new negative results
  data_color(
    columns = new_negative,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::green_material", n = 10)),
      domain = NULL
    )
  )
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous_negative previous_positive new_negative new_positive hiv_status_not_ascertained
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program

We can also change the color of the text conditionally, depending on its value. In the following example we highlight values in the previous_positive column according to a threshold: if the value is greater than 2000 the text is red, and if it is less than 2000 the text is green (a value of exactly 2000 would keep the default style). We also make the text bold. We do this with tab_style(): the style argument describes the formatting, the locations argument says where it applies, and the condition goes in the rows argument of cells_body().

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  ) |> 
  data_color(
    columns = new_negative,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::green_material", n = 10)),
      domain = NULL
    )
  ) |> 
  # bold red text where previous_positive is above 2000
  tab_style(
    style = cell_text(
      color = "red",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive > 2000
    )
  ) |> 
  # bold green text where previous_positive is below 2000
  tab_style(
    style = cell_text(
      color = "green",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive < 2000
    )
  )
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous_negative previous_positive new_negative new_positive hiv_status_not_ascertained
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program
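
The same rows condition also works for categorical values, such as a status column with the levels met, partially met and not met. The sketch below uses an invented data frame and arbitrary colors, since the Malawi summary has no such column:

```r
library(gt)

# Invented indicator data, for illustration only
targets <- data.frame(
  district = c("A", "B", "C"),
  status = c("met", "partially met", "not met")
)

status_table <- targets |>
  gt() |>
  # One tab_style() call per category; `rows` picks rows by their text value
  tab_style(
    style = cell_fill(color = "#C8E6C9"),
    locations = cells_body(columns = status, rows = status == "met")
  ) |>
  tab_style(
    style = cell_fill(color = "#FFF59D"),
    locations = cells_body(columns = status, rows = status == "partially met")
  ) |>
  tab_style(
    style = list(cell_fill(color = "#FFAB91"), cell_text(weight = "bold")),
    locations = cells_body(columns = status, rows = status == "not met")
  )

status_table
```

Writing one call per category is verbose but explicit: each level gets its own style, and a level that is not listed simply keeps the default look.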

0.5.2.3 Fonts and text

Now is a good time to customize the text in the table. We can do that with gt::tab_style(). This function can style not only the values in the table body but any other part of the table as well.

Let’s change the font and color of the title and the subtitle, for example. I’m choosing the Yanone Kaffeesatz font from Google Fonts. Google Fonts offers more than a thousand font families to choose from, which can be more interesting than the default Excel fonts.

To do that, we need to specify a few details in gt::tab_style():

  • We assign a list to the style argument.

  • In that list we state that we are editing text (as opposed to fills or borders) with the cell_text() function.

  • Inside cell_text() we specify the details we want, i.e., the font and the color.

  • Finally, we tell tab_style() where to apply these changes through the locations argument. In our case that is the title and the subtitle, so we assign a list containing cells_title(groups = c("title", "subtitle")), as in the code below.

  • Note that to change the appearance of only the title or only the subtitle, you can simply use locations = list(cells_title(groups = "title")) or locations = list(cells_title(groups = "subtitle")), without the need for c(…).

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  ) |> 
  data_color(
    columns = new_negative,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::green_material", n = 10)),
      domain = NULL
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "red",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive > 2000
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "green",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive < 2000
    )
  ) |> 
  # font and color of the title and the subtitle
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Yanone Kaffeesatz"),
        color = "#22668D"
      )
    ),
    locations = list(
      cells_title(groups = c("title", "subtitle"))
    )
  )
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous_negative previous_positive new_negative new_positive hiv_status_not_ascertained
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program
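
When the title and the subtitle should look different from each other, one tab_style() call per location does the job. A minimal sketch on a two-row excerpt (the header text and the style choices here are arbitrary, picked only for illustration):

```r
library(gt)

# Two values copied from the Central East Zone figures, for illustration
excerpt <- data.frame(
  period = c("2019 Q1", "2019 Q2"),
  new_positive = c(616, 443)
)

title_table <- excerpt |>
  gt() |>
  tab_header(
    title = "New positive tests",
    subtitle = "Central East Zone, 2019"
  ) |>
  # Title only: bold and colored
  tab_style(
    style = cell_text(color = "#22668D", weight = "bold"),
    locations = cells_title(groups = "title")
  ) |>
  # Subtitle only: italic and smaller
  tab_style(
    style = cell_text(style = "italic", size = "small"),
    locations = cells_title(groups = "subtitle")
  )

title_table
```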

Additionally, we can style the column labels and the row labels in the same way; all we need to do is specify the right location. This time we also change the background (fill) color of the cells by adding a second style function, cell_fill(), in which we provide the background color. Lastly, in the locations argument, and similar to the style argument, we assign a list that says where the changes apply. Here we use cells_column_labels() and select all the column labels with columns = everything().

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  ) |> 
  data_color(
    columns = new_negative,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::green_material", n = 10)),
      domain = NULL
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "red",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive > 2000
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "green",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive < 2000
    )
  ) |> 
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Yanone Kaffeesatz"),
        color = "#22668D"
      )
    ),
    locations = list(
      cells_title(groups = c("title", "subtitle"))
    )
  ) |> 
  # text style plus background fill for all the column labels
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Righteous"),
        color = "#57375D"
      ),
      cell_fill(color = "#F2E8C6")
    ),
    locations = list(
      cells_column_labels(columns = everything())
    )
  )
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous_negative previous_positive new_negative new_positive hiv_status_not_ascertained
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program

In a similar manner, we can style the row group labels and the period column. All we need to do is add them to the locations argument, using cells_row_groups() for the group labels and cells_body() for the period column, as follows:

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  ) |> 
  data_color(
    columns = new_negative,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::green_material", n = 10)),
      domain = NULL
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "red",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive > 2000
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "green",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive < 2000
    )
  ) |> 
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Yanone Kaffeesatz"),
        color = "#22668D"
      )
    ),
    locations = list(
      cells_title(groups = c("title", "subtitle"))
    )
  ) |> 
  # same style for the column labels, the row group labels and the period column
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Righteous"),
        color = "#57375D"
      ),
      cell_fill(color = "#F2E8C6")
    ),
    locations = list(
      cells_column_labels(columns = everything()),
      cells_row_groups(groups = everything()),
      cells_body(columns = period)
    )
  )
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous_negative previous_positive new_negative new_positive hiv_status_not_ascertained
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program
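
Borders, which the learning objectives list as a further customization, follow the same style-plus-location pattern through cell_borders(). A small sketch on a two-row excerpt, with colors and line weights chosen arbitrarily for illustration:

```r
library(gt)

# Two values copied from the Central East Zone figures, for illustration
excerpt <- data.frame(
  period = c("2019 Q1", "2019 Q2"),
  new_positive = c(616, 443)
)

border_table <- excerpt |>
  gt() |>
  # A thicker colored line under the column labels
  tab_style(
    style = cell_borders(sides = "bottom", color = "#57375D", weight = px(3)),
    locations = cells_column_labels(columns = everything())
  ) |>
  # A thin dashed line to the right of the period column
  tab_style(
    style = cell_borders(
      sides = "right",
      color = "#57375D",
      style = "dashed",
      weight = px(1)
    ),
    locations = cells_body(columns = period)
  )

border_table
```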

The point of what we’ve done here is to give you control over what YOU want to achieve, not to prescribe exactly what you have to do. There are endless ways to customize a gt table; it’s up to you to choose what you need and what works for your workflow.

0.5.3 Stratifying tables by groups

0.5.3.1 Spanner columns

Spanners are very useful in a table: they help us read it and put the information in context by grouping related columns together. In our case, for example, we have two groups: previous and new HIV test outcomes.

Since four of our columns are conveniently named [previous/new]_[negative/positive], we can use this to our advantage. The tab_spanner_delim() function splits each column name at the delimiter “_”, using the first part as the spanner label and the second part as the column label:

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  ) |> 
  data_color(
    columns = new_negative,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::green_material", n = 10)),
      domain = NULL
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "red",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive > 2000
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "green",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive < 2000
    )
  ) |> 
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Yanone Kaffeesatz"),
        color = "#22668D"
      )
    ),
    locations = list(
      cells_title(groups = c("title", "subtitle"))
    )
  ) |> 
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Righteous"),
        color = "#57375D"
      ),
      cell_fill(color = "#F2E8C6")
    ),
    locations = list(
      cells_column_labels(columns = everything()),
      cells_row_groups(groups = everything()),
      cells_body(columns = period)
    )
  ) |> 
  # columns 3 to 6 are the four previous_* and new_* columns
  tab_spanner_delim(delim = "_", columns = 3:6)
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
         previous            new
period   negative  positive  negative  positive  hiv_status_not_ascertained
Central East Zone
2019 Q1       994      1156     47471       616                        2824
2019 Q2       517      1036     42923       443                        3209
2019 Q3      1097      1158     46700       534                        3583
2019 Q4       595      1039     45807       399                        1961
Central West Zone
2019 Q1      1568      2526     75547      1388                        2131
2019 Q2      1322      2567     73520      1470                        1804
2019 Q3      1548      2844     81099      1382                        1657
2019 Q4       457      2715     78921      1292                        1216
Northern Zone
2019 Q1       675      1197     36196       664                        1126
2019 Q2       590      1084     35315       582                        1301
2019 Q3       542      1191     36850       570                         954
2019 Q4       346      1132     34322       519                         747
South East Zone
2019 Q1      1583      5766     74926      1976                        1454
2019 Q2      1672      5688     76937      1890                        1566
2019 Q3      1910      5966     80067      1803                        1243
2019 Q4      1504      5953     79454      1861                        1385
South West Zone
2019 Q1      1775      4171     50554      1555                         717
2019 Q2      1504      4726     53554      1747                        1566
2019 Q3      1394      4640     55813      1618                         928
2019 Q4      3391      4861     53118      1575                         975
Data from the Malawi HIV Program
  # This is another way to create the spanners as well
  # tab_spanner(
  #   label = "New Cases", columns = c(new_negative, new_positive)
  # ) |> 
  # tab_spanner(
  #   label = "Previous Cases", columns = c(previous_negative, previous_positive)
  # ) |> 
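
The commented-out alternative above can be made runnable on its own. A small sketch, using two rows copied from the table; the cols_label() step is an addition for this example and not part of the original code, since explicit spanners leave the full column names in place:

```r
library(gt)

# Two rows copied from the Central East Zone figures, for illustration
excerpt <- data.frame(
  period = c("2019 Q1", "2019 Q2"),
  previous_negative = c(994, 517),
  previous_positive = c(1156, 1036),
  new_negative = c(47471, 42923),
  new_positive = c(616, 443)
)

spanner_table <- excerpt |>
  gt() |>
  tab_spanner(
    label = "Previous Cases",
    columns = c(previous_negative, previous_positive)
  ) |>
  tab_spanner(
    label = "New Cases",
    columns = c(new_negative, new_positive)
  ) |>
  # With explicit spanners the column names are not split, so relabel them
  cols_label(
    previous_negative = "negative",
    previous_positive = "positive",
    new_negative = "negative",
    new_positive = "positive"
  )

spanner_table
```

tab_spanner() is more typing than tab_spanner_delim(), but it lets you choose any spanner label and works even when the column names do not share a delimiter.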

As you can see, the styling is off for the spanners, because we didn’t specify any. We can fix that in two ways. Either we move the spanner line before the styling call and add the spanners to its locations argument, so that they share the styling of the column and row labels, or we create a new styling just for the spanners (see the commented code after the table). To keep things simple, we will go with the first solution: we move the spanner code up and add its location to the styling we already have.

summarized_data |> 
  gt() |> 
  tab_header(
    title = "Sum of cases of HIV in Malawi",
    subtitle = "from Q1 2019 to Q4 2019"
  ) |> 
  tab_source_note("Data from the Malawi HIV Program") |> 
  data_color(
    columns = new_positive,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 10)),
      domain = NULL
    )
  ) |> 
  data_color(
    columns = new_negative,
    fn = scales::col_numeric(
      palette = as.character(paletteer::paletteer_d("ggsci::green_material", n = 10)),
      domain = NULL
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "red",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive > 2000
    )
  ) |> 
  tab_style(
    style = cell_text(
      color = "green",
      weight = "bold"
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive < 2000
    )
  ) |> 
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Yanone Kaffeesatz"),
        color = "#22668D"
      )
    ),
    locations = list(
      cells_title(groups = c("title", "subtitle"))
    )
  ) |> 
  # the spanners are now created before the styling call that refers to them
  tab_spanner_delim(delim = "_", columns = 3:6) |> 
  # this is the styling code for the column and row labels
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Righteous"),
        color = "#57375D"
      ),
      cell_fill(color = "#F2E8C6")
    ),
    locations = list(
      cells_column_labels(columns = everything()),
      cells_row_groups(groups = everything()),
      cells_body(columns = period),
      ## adding the spanners location
      cells_column_spanners(spanners = everything())
    )
  )
Sum of cases of HIV in Malawi
from Q1 2019 to Q4 2019
period previous new hiv_status_not_ascertained
negative positive negative positive
Central East Zone
2019 Q1 994 1156 47471 616 2824
2019 Q2 517 1036 42923 443 3209
2019 Q3 1097 1158 46700 534 3583
2019 Q4 595 1039 45807 399 1961
Central West Zone
2019 Q1 1568 2526 75547 1388 2131
2019 Q2 1322 2567 73520 1470 1804
2019 Q3 1548 2844 81099 1382 1657
2019 Q4 457 2715 78921 1292 1216
Northern Zone
2019 Q1 675 1197 36196 664 1126
2019 Q2 590 1084 35315 582 1301
2019 Q3 542 1191 36850 570 954
2019 Q4 346 1132 34322 519 747
South East Zone
2019 Q1 1583 5766 74926 1976 1454
2019 Q2 1672 5688 76937 1890 1566
2019 Q3 1910 5966 80067 1803 1243
2019 Q4 1504 5953 79454 1861 1385
South West Zone
2019 Q1 1775 4171 50554 1555 717
2019 Q2 1504 4726 53554 1747 1566
2019 Q3 1394 4640 55813 1618 928
2019 Q4 3391 4861 53118 1575 975
Data from the Malawi HIV Program
  # In case we want to have different styling for the spanners we can run this.
  # tab_spanner_delim(delim = "_", columns = 3:6) |> 
  # tab_style(
  #   style = list(
  #     cell_text(
  #       font = google_font(name = "Righteous"),
  #       color = "#57375D"
  #     ),
  #     cell_fill(color = "#F2E8C6")
  #     ),
  #   locations = list(
  #     cells_column_spanners(spanners = everything())
  #     )
  #   ) 

0.5.4 Bonus knowledge: kableExtra

We have seen how flexible and powerful gt can be for making good-looking tables, and we have barely scratched the surface of what it can do. An alternative worth knowing is kableExtra, another R package dedicated to building tables.

The simplest way to create a kable table is with the kbl() function:

library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
summarized_data |> 
  kbl()
zone               period   previous_negative  previous_positive  new_negative  new_positive  hiv_status_not_ascertained
Central East Zone  2019 Q1                994               1156         47471           616                        2824
Central East Zone  2019 Q2                517               1036         42923           443                        3209
Central East Zone  2019 Q3               1097               1158         46700           534                        3583
Central East Zone  2019 Q4                595               1039         45807           399                        1961
Central West Zone  2019 Q1               1568               2526         75547          1388                        2131
Central West Zone  2019 Q2               1322               2567         73520          1470                        1804
Central West Zone  2019 Q3               1548               2844         81099          1382                        1657
Central West Zone  2019 Q4                457               2715         78921          1292                        1216
Northern Zone      2019 Q1                675               1197         36196           664                        1126
Northern Zone      2019 Q2                590               1084         35315           582                        1301
Northern Zone      2019 Q3                542               1191         36850           570                         954
Northern Zone      2019 Q4                346               1132         34322           519                         747
South East Zone    2019 Q1               1583               5766         74926          1976                        1454
South East Zone    2019 Q2               1672               5688         76937          1890                        1566
South East Zone    2019 Q3               1910               5966         80067          1803                        1243
South East Zone    2019 Q4               1504               5953         79454          1861                        1385
South West Zone    2019 Q1               1775               4171         50554          1555                         717
South West Zone    2019 Q2               1504               4726         53554          1747                        1566
South West Zone    2019 Q3               1394               4640         55813          1618                         928
South West Zone    2019 Q4               3391               4861         53118          1575                         975

We can also give it the look of a table in a scientific publication with the kable_classic() function:

summarized_data |> 
  kbl() |> 
  kable_classic()
[Output: the same 20-row table as in the kbl() example above, rendered with the classic theme.]

We can also add a caption with the caption argument of kbl(), and set the font and the table width with the html_font and full_width arguments of kable_classic(), respectively. Setting full_width = FALSE stops the table from stretching across the full width of the page.

summarized_data |> 
  kbl(caption = "Same table, different trick") |> 
  kable_classic(full_width = FALSE, html_font = "Cambria")
Same table, different trick
[Output: the same 20-row table as in the kbl() example above, now with this caption, set in Cambria, and no longer stretched to the full page width.]

For a simpler look, we can use the kable_material() function, here with striped rows and a hover highlight:

summarized_data |> 
  kbl(caption = "Same table, different trick") |> 
  kable_material(c("striped", "hover"), html_font = "Cambria", full_width = FALSE)
Same table, different trick
[Output: the same 20-row table as in the kbl() example above, rendered with the material theme.]

The point here is to show how many options you have for making a polished table. The details of kableExtra are beyond the scope of this lesson, but it is worth being familiar with it alongside gt.
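
To give a flavour of those options, the sketch below rebuilds three features of the gt table in kableExtra: grouped column headers (similar to gt's spanners) with add_header_above(), row groups with pack_rows(), and conditional text styling with column_spec(). It is a minimal sketch and not part of the original lesson: excerpt is a hypothetical data frame holding only the first two zones and quarters of the summarized data, the column labels are chosen to mimic the gt output, and a recent kableExtra version (one where column_spec() accepts one value per row) is assumed.

```r
library(kableExtra)

# Hypothetical excerpt of summarized_data: first two zones, first two quarters
excerpt <- data.frame(
  zone = rep(c("Central East Zone", "Central West Zone"), each = 2),
  period = rep(c("2019 Q1", "2019 Q2"), times = 2),
  previous_negative = c(994, 517, 1568, 1322),
  previous_positive = c(1156, 1036, 2526, 2567),
  new_negative = c(47471, 42923, 75547, 73520),
  new_positive = c(616, 443, 1388, 1470)
)

# Rows to highlight, mirroring the gt rule `previous_positive < 2000`
low_previous_positive <- excerpt$previous_positive < 2000

grouped_table <- excerpt[, -1] |>  # drop zone; it becomes the row-group label
  kbl(
    format = "html",
    col.names = c("period", "negative", "positive", "negative", "positive")
  ) |>
  kable_classic(full_width = FALSE, html_font = "Cambria") |>
  # column 3 is previous_positive; color and bold take one value per row
  column_spec(
    3,
    color = ifelse(low_previous_positive, "green", "black"),
    bold = low_previous_positive
  ) |>
  # each name is a header label, each value the number of columns it spans
  add_header_above(c(" " = 1, "previous" = 2, "new" = 2)) |>
  # each name is a group label, each value the number of consecutive rows in it
  pack_rows(index = c("Central East Zone" = 2, "Central West Zone" = 2))

grouped_table
```

With the full summarized_data, the pack_rows() index would need one entry per zone, each spanning four rows.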

0.5.5 Wrap up

In this lesson we learned how to create publication-ready tables with gt and, more briefly, with kableExtra. We covered the idioms of the grammar of tables, reshaped data into the format a table needs, built a table with gt, and customized its components to suit our needs and style. This is not an exhaustive list of what gt can do; it is only an example of what your final product could look like should you choose to invest in learning the package.

0.6 External resources and packages